Crowd-based MT Evaluation for non-English Target Languages

Authors

  • Michael Paul
  • Eiichiro Sumita
  • Luisa Bentivogli
  • Marcello Federico
Abstract

This paper investigates the feasibility of using crowd-sourcing services for the human assessment of machine translation quality of translations into non-English target languages. Non-expert graders are hired through the CrowdFlower interface to Amazon’s Mechanical Turk in order to carry out a ranking-based MT evaluation of utterances taken from the travel conversation domain for 10 Indo-European and Asian languages. The collected human assessments are analyzed for their worker characteristics, evaluation costs, and quality of the evaluations in terms of the agreement between non-expert graders and expert/oracle judgments. Moreover, data quality control mechanisms including “locale qualification,” “qualification testing,” and “on-the-fly verification” are investigated in order to increase the reliability of the crowd-based evaluation results.
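The agreement analysis mentioned in the abstract can be illustrated with a common chance-corrected measure such as Cohen’s kappa, computed over pairwise ranking judgments. This is a minimal sketch, not the paper’s own implementation; the label set (“A”, “B”, “tie”) and the toy judgment data are assumptions for illustration only.

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two graders' labels over the same items."""
    assert len(labels_a) == len(labels_b)
    n = len(labels_a)
    # Observed agreement: fraction of items both graders labeled identically.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected chance agreement, from each grader's label distribution.
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    return (p_o - p_e) / (1 - p_e)

# Toy pairwise ranking judgments ("A", "B", or "tie") from one crowd
# grader and one expert over the same 8 translation pairs (invented data).
crowd  = ["A", "B", "A", "tie", "B", "A", "A", "B"]
expert = ["A", "B", "B", "tie", "B", "A", "A", "A"]
print(round(cohens_kappa(crowd, expert), 3))  # → 0.579
```

A kappa near 1 indicates agreement well above chance, while a value near 0 means the crowd grader’s rankings are no more consistent with the expert’s than random labeling would be.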


Related papers

Evaluation of Machine Translation Output for an Unknown Source Language: Report of an ISLE-Based Investigation

It is often assumed that knowledge of both the source and target languages is necessary in order to evaluate the output of a machine translation (MT) system. This paper reports on an experimental evaluation of Chinese-English MT and Spanish-English MT from output specifically designed for evaluators who do not read or speak Chinese or Spanish. An outline of the characteristics measured and eval...


Crowd-based Evaluation of English and Japanese Machine Translation Quality

This paper investigates the feasibility of using crowd-sourcing services for the human assessment of machine translation quality of English and Japanese translation tasks. Non-expert graders are hired in order to carry out a ranking-based MT evaluation of utterances taken from the domain of travel conversations. Besides a thorough analysis of the obtained non-expert grading results, data quality...


Edinburgh SLT and MT System Description for the IWSLT 2014 Evaluation

This paper describes the University of Edinburgh’s spoken language translation (SLT) and machine translation (MT) systems for the IWSLT 2014 evaluation campaign. In the SLT track, we participated in the German↔English and English→French tasks. In the MT track, we participated in the German↔English, English→French, Arabic↔English, Farsi→English, Hebrew→English, Spanish↔English, and Portuguese-Br...


The Correlation of Machine Translation Evaluation Metrics with Human Judgement on Persian Language

Machine Translation Evaluation Metrics (MTEMs) are the central core of Machine Translation (MT) engines as they are developed based on frequent evaluation. Although MTEMs are widespread today, their validity and quality for many languages is still under question. The aim of this research study was to examine the validity and assess the quality of MTEMs from Lexical Similarity set on machine tra...


English-Japanese Example-Based Machine Translation Using Abstract Linguistic Representations

This presentation describes an example-based English-Japanese machine translation system in which an abstract linguistic representation layer is used to extract and store bilingual translation knowledge, transfer patterns between languages, and generate output strings. Abstraction permits structural neutralizations that facilitate learning of translation examples across languages with radically ...



Publication date: 2012